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Abstract 

A two-stage approach is proposed to overcome the problem in quantile regression, 
where separately fitted curves for several quantiles may cross. The standard Bayesian 
quantile regression model is applied in the first stage, followed by a Gaussian pro¬ 
cess regression adjustment, which monotonizes the quantile function whilst borrowing 
strength from nearby quantiles. The two stage approach is computationally efficient, 
and more general than existing techniques. The method is shown to be competitive 
with alternative approaches via its performance in simulated examples. 
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1 Introduction 


In many applications, interest lies in describing the effect of a set of covariates at the tail 
of the response distribution, which can be considerably different from their impact at the 


mean (Koenker 2005). The need to have a wider picture of the conditional distribution of 
the response variable is the main reason for the origin and popularity of quantile regression 
methods. These models have been applied to many areas, including the environmental 
sciences; medicine; engineering and economics. Often in the context of risk assessment, 
where the tail of the distribution plays an important role. In many cases, quantile estimates 
at several different quantile levels are needed, so when estimation for each level is carried out 
separately, the monotonicity of the conditional quantiles can be violated, giving rise to the 
phenomenon of crossing quantile regression curves. This leads to difficulties for inference, 
since by definition, the conditional quantile function should be monotonically increasing. 
Figure [I] shows the estimated conditional quantile function using the Immunoglobulin-G 


(IgG) data set of Isaacs et ah (1983), with age as the regressor. The figure shows the quantile 
function at age 6. The dashed line shows the estimates obtained from separately fitting 
the quantile regression at several levels, and it is evident that the curve is not monotone, 
particularly at the extremal levels. The solid line demonstrates the correction obtained using 
regression adjustment for monotonicity proposed later in this article. 

Quantile regression models were introduced in a seminal paper by Koenker & Bassett 


(1978). By writing the sample quantiles as an optimisation problem, and generalising to the 
linear regression model, they proposed minimising the loss function 


i =1 

where p T is the check function p T {u) = u(t — I (u < 0)). The solution to this minimisation 
yields the r-th quantile regression estimate. Most of frequentist literature in quantile re¬ 
gression is based on this estimator, which does not make probabilistic assumptions for the 
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Figure 1: Estimated conditional quantile function of serum concentration of IgG for children 
at age 6, using standard Bayesian quantile regression (dashed line, BQR) and regression 
adjusted estimates (solid line, GPR). 


response variable. Consequently, inference for the parameters of interest relies on its asymp¬ 


totic properties (see Koenker (2005) and references therein). In the Bayesian framework, 


motivated by its equivalence to the minimization problem, Yu & Moyeed (2001) proposed 
the use of the asymmetric Laplace distribution (ALD) as an approximation to the likelihood. 
The approach is appealing since even when the true distribution of the data was not ALD, 


empirical results were satisfactory. Sriram et ah (2013) showed that under some mild con¬ 
ditions posterior consistency can be established for the linear quantile regression estimates 
based on the ALD, providing some theoretical support for the model. 

A drawback of the above approaches is that, as quantiles are fitted separately, the con¬ 
ditional quantile curves are not smooth and the fitted regression lines may cross, which 
violates the basic probabilistic rule and causes problems for inference in practice. To over¬ 


come crossing, He (1997) restricted the space of possible solutions of the response distribution 


to location-scale changes of a base distribution to get noncrossing curves. Yu & Jones (1998) 


and Hall et al. (1999), among others, proposed to estimate the conditional distribution func¬ 
tion nonparametrically. Dette fe Volgushev|(2008) and Chernozhukov et al. (2009) proposed 
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to monotonize an estimated conditional distribution function and then invert it to obtain the 


quantiles. A clever approach was presented by Bonded et al. (2010) to simultaneously esti¬ 
mate several quantile levels by solving the constrained minimisation problem. They showed 
that in the linear case, the number of constraints can be greatly reduced, leading to an 
efficient estimation algorithm. The method is extendable to linear splines. 

Most solutions to noncrossing quantiles in the Bayesian literature proceed by simultane¬ 
ously fitting several quantiles. In the case of linear quantile regression, including spatially 


correlated data, Reich et al. (2011) were able to simplify the noncrossing constraints by writ¬ 
ing the quantile process using Bernstein basis polynomials. Nevertheless, the likelihood does 
not have a closed form and for moderately sized data sets the proposed method is infeasible, 
so an adjustment to the classical estimates is also suggested. For linear regression with a 


single predictor, or a single index model, Tokdar & Kadane (2012) suggested a reparametriza- 
tion of the quantile function that induces monotonicity. Recently, an ingenious solution for 


linear quantile regression was presented by Reich & Smith (2013), where the quantile func¬ 
tion is modelled piece-wise using the linear heteroscedastic model. While many of these 
approaches work well in certain instances, they lack flexibility for more complex situations. 


Alternative Bayesian nonparametric methods have also been proposed (see Scaccia & Green 


(2003) and Taddy & Kottas (2010), among others). They specify flexible error distributions 
using Bayesian nonparametric techniques, but model simplicity is generally compromised. 


As an alternative, D un son & Taylor (2005), and similarly Lancaster & Jun (2010), proposed 
a substitution likelihood for approximate simultaneous quantile inference. 

In this article, we adopt a fully Bayesian approach, thus inference does not rely on asymp¬ 
totic results, which can be particularly delicate in the quantile crossing context where sample 
sizes may not be big enough. We propose a two-stage approach for the simultaneous estima¬ 
tion of quantile regression at multiple levels. The first stage uses standard Bayesian quantile 
regression with ALD, fitted separately at different quantile levels. This takes advantage of 
the flexibility of modelling afforded by fitting quantiles for a specific level. These initial 
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estimates are then adjusted by borrowing strength across nearby quantiles using Gaussian 
process regression in the second stage. The noncrossing constraints are controlled through 
a single parameter and no MCMC is needed in the second stage, resulting in a simple and 
efficient algorithm. The advantages of our approach are that our proposed method produces 
smoother estimates of the conditional quantile functions (see Figure [Tj) , and can handle 
complex modelling situations through the use of ALD in the first stage inference. 

The rest of the article is organized as follows. Section 2 introduces the two-stage model 
for adjusted Bayesian quantile regression, along with its properties and estimation proce¬ 
dure. Simulations are performed in Section 3 to extensively compare the performance of the 
proposed method with the best current solutions. In Section 4, we analyse two real examples, 
the famous data set of serum concentration of immunoglobulin-G for young children and the 
global mean sea level time series. The final section offers some concluding discussions. 

2 Bayesian Quantile Regression Adjustment 

This section addresses the two-stage approach for noncrossing Bayesian quantile regression. 
We first present standard Bayesian quantile regression with ALD in Section 2.1, and then 
describe the second stage Gaussian process regression adjustment in Section 2.2. 

2.1 First stage: Standard Bayesian quantile regression 

Let y = (?/i,... ,y n ) be a vector of observed random variables from an unknown true distri¬ 
bution /. Associated with each observation, we have a fc-dimensional vector of covariates 
Xj e M fc . For quantile level 0 < r < 1, the r-th quantile regression model is given by 

Zh|Xj = MX*) + i = 1, • ■ ■ , n (1) 

where e* are independent from a distribution whose r-th quantile equals to 0, i.e. P(e; < 
0) = r, and h T is an arbitrary function of the covariates. 
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The standard Bayesian estimation procedure is employed in the first stage. Here we con¬ 
sider the use of the auxiliary likelihood given by the ALD(p) distribution as an approximation 
to the true likelihood where 


L a (y\l*,<r,p) 


p n (l-p) n 

a n 


exp 



( 2 ) 


where the mean /q = /i T (X,;) and scale parameter cr are estimated. The value p corresponds 
to the value of the quantile. If we are interested in the rth quantile, the ALD(p = r) is 
fitted to provide a good approximation for the true quantile function Q{r\x) locally at r. 
For several quantile levels, t — pi,.. .pp, the procedure is carried out for each value of r 
using their respective ALD distributions. It is in this last procedure that the phenomenon 
of crossing quantile curves are often observed, as the curves are fitted independently, using 
different likelihood functions. 

Posterior estimation of unknown parameters for a given r is typically obtained via 
MCMC methods. For the r-th conditional quantile under auxiliary model ALD(p = r), 
let <3 ^(t|X,p — r),t — 1, ...,T, denotes the t-th posterior quantile estimate given by the 
MCMC samples. Following Bayesian model averaging (BMA) approach, the r-th quantile 
point estimate is given by 

1 T 

Q s (r\x) = -^Q w (r|X,p = r), (3) 

t =i 

where the index s denotes the standard estimate. The standard formulation of the first stage 
encompasses any model written as © and ([2]), our simulations examples later will consider 
both linear and non-linear regression models. 
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2.2 Second stage: Gaussian process regression adjustment 

Standard quantile estimate ([3]) is computed based only on the MCMC samples from the 
auxiliary model ALD(p = r). However, as adjacent quantiles are correlated, it is expected 
that other auxiliary models ALD(p = t') will contain useful information for r if r' is nearby. 
Since in the first stage we have fitted P auxiliary models ALD(p), p = pi,...,pp, we can 
calculate the induced r-th quantile posterior sample for any given auxiliary model from 


e<‘>(T|x, P ) = F-‘(T;yv < ‘ ) ,p) = < 


y<> + 


(t) 


T 


log - , if 0 <r <p 
1 ~P \P, 




(t) _ 


a 


(t) 


-log 

p 


1 — T 
1 — p 


( 4 ) 


, if p < r < 1 


which is the quantile function of the fitted auxiliary model ALD(//^, a^\p) (Yu & Zhang 


2005). Equation Q provides us with (P—l)xT additional posterior samples for the quantile 
at r (assuming that one of the pi,.. .pp values equals to r). A smoother noncrossing quan¬ 
tile estimate can then be obtained by borrowing strength from these induced r-th quantile 


posterior samples using Gaussian process regression on all PxT estimates (Rasmussen & 

Q (<) (r|X,p) = g(p) +e 


Williams, 2006), 


e ~AT(0,E) 

g(p) ~ GV(o,k), 


( 5 ) 


where E and K are covariance matrices with dimension (PxT, PxT). More specifically, E 
is a diagonal covariance matrix whose diagonal entries are the posterior variances of the cor¬ 
responding <3 ^(t|X,p), denoted by cr 2 (r|X,p). Independence is assumed between samples 
of the induced models as these were obtained independently, and approximate independence 
within each model is also attained by taking every mth MCMC sample. Moreover, as auxil¬ 
iary models ALD(p), for p close to r, carry more information about the r-th quantile than 
more distant ones, we build the Gaussian process covariance matrix as a decreasing function 
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of the distance between the models. Hence, using the squared exponential kernel, covariance 
matrix entries between induced quantiles from any two auxiliary models are given by 


k(p,p) 


a\ exp 



{P-P'f j, 


( 6 ) 


where b is the bandwidth and a\ is a variance hyperparameter of the prior. As we are 
centering the prior arbitrarily on zero, cr| should be large to result in an uninformative prior. 
However, choosing values too big may lead to computational issues while inverting matrices. 
Throughout the article, we adopted a\ = 100, results were not sensitive to any moderately 
large value of a\. 

The Gaussian process prior over the posterior mean function g(p) allows us to model 
nonparametrically the embedded correlation structure, using information from other auxil¬ 
iary models to adjust the standard posterior mean ([3]). The final r-th quantile estimate will 
then be the adjusted posterior mean for standard auxiliary model ALD(p = r), denoted by 


Q a (r\x), which is very simple to obtain as closed form predictive posterior distribution is 


available (Rasmussen & Williams, 2006), 


Q*(r|X,p*=r, Q w ) ~ A7(/i*, a;), 


where 


p T 

P* = Q a (r\x) = ^^w p Q a) (r|X,p), 

p= 1 t= 1 

= k{r,r) - WK(.,r ) + a 2 (r\X,p = r), 
W = K(.,t) t (K AS)" 1 , 


(7a) 

(7b) 

(7c) 


also K(.,t ) is a covariance matrix column where p' — t and w p is an element of the row 
vector of weights W € M PT . 





Regardless of the number of quantiles being estimated and their initial values, noncrossing 
quantile estimates can always be obtained from this Gaussian process regression adjustment, 
as stated in Proposition [[} 


Proposition 1. Let Qo.(t\x) be the adjusted r-th quantile estimate given in (7a). Then 
for any set of quantiles iq < ... < r P , P e 7L\, there always exist a bandwidth b such that 
Qa{n\x) < ... < Q a (r P \x). 


Proof. As b —> oo, weights are all equal and the adjusted r-th quantile estimate (7a) can be 
written as 




Qa(r\x) 


1 

PxT 


EE« (,) w x .p)- 


p= 1 t— 1 


Being a simple average of nondecreasing quantile functions Q^(r|X,p), the final quantile 
function is also nondecreasing for b — y oo. In fact, for the same reason, monotonicity holds 
when the weight function is constant for the interval 0 < p < 1 , which is generally achieved 
for moderately large b. .CJ 


Although the monotonicity guarantee presented in Proposition [l] is important, in general, 
we do not need to restrict the quantile function to this limiting case to get rid of the crossing. 
In practice, a small bandwidth b already adds enough smoothness to prevent crossing and 
is preferable in order to keep the weights concentrated on the target model. In fact, from 
Equations (|6|) and (7c), it is easy to see that, if the bandwidth b goes to zero, the weights 


are nonzero only for the target model (p = r) and Qa{x\x) = Q s (t\x), which is the standard 
posterior mean. Therefore, the bandwidth, which is the only unknown parameter of Gaussian 
process model (|5]) , will be estimated as the minimum value of b that ensures noncrossing 
everywhere (for all r and X). Consequently, troublesome noncrossing constraints are handled 
easily here by means of a single smoothing parameter. Moreover, posterior consistency holds 
if the initial model is consistent, this is stated in Proposition [2] 
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Proposition 2. Let Q s {r\x) be the standard r-th quantile estimate pi) and Q a {r\x) the final 


estimate (7a). Then, if Q s (r\x) is consistent, Q a (r\x) is also consistent. 


Proof. If Q s {t\x) is consistent, it does not cross as sample size goes to infinity. Therefore, 
asymptotically, the estimated bandwidth b goes to zero and the final estimate reduces to the 
initial one. □ 


Estimation of regression model ([5]) can be further simplified as the weights given in (7c) do 
not depend on iteration number t. Therefore, instead of using all MCMC posterior samples 
Q^\t |X, x), the same final estimates can be obtained by fitting a Gaussian process only 
to the induced posterior means, avoiding Gaussian process issues with large data sets. The 
result is presented in Proposition [3] (see Appendix for proof). 


Proposition 3. Let Q s (r|X,p) be the posterior mean of induced r-th quantile from auxiliary 
model ALD(p), 

1 T 

Q,(r|X,p) = -^Q«‘)(r|X,p). (8) 

t= 1 


Then adjusted posterior mean (7a) and variance (7b) can be obtained by fitting a Gaussian 


process only to the induced posterior means Q s (r\X,p), i.e. 


Qs(r\X,p) = g\p) + e' 

e' ~ M{ 0, E') (9) 

g\p)~gV(f),K'), 

where S' is a (P, P) diagonal covariance matrix whose diagonal entries are cr 2 (r|X,p)/T, 
since independence is assumed, and K' is a (P, P) covariance matrix whose entries are 
calculated as in (|6]) . More specially, if we let g'{p*=T) ~ A/"(/d, a' 2 ) be the predictive posterior 
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distribution for the mean function of model (|9|), then the equivalence is as follow 


p 


h* = h' = ^w' p Q s (t\X,p), 
p= i 

04 = (T /2 + (T 2 (r|X,p = r), 


( 10 ) 


where a' 2 = kfr, r) - W'K'(., r) and W' = K'{., r) T (/C ' + S') -1 . 

Hence, the final quantile estimate is a weighted average of the induced quantile posterior 


means (10). Furthermore, the posterior variance (10) suffers no substantial change, as the 
additional term a' 2 is the predictive posterior mean variance, being in the order of magnitude 
of the Monte Carlo variance (cx /2 ~ u 2 /T), which decreases to zero at the rate 1/T. 

We illustrate the estimation process in Figure [2j We consider estimating the conditional 
deciles for a given data set. After fitting standard Bayesian quantile regression in the first 
stage, if we look at a given X, we have a set of estimated conditional quantile functions, one 


for each auxiliary model. Focusing on the 0.4-th conditional quantile, Figure 2a shows the 
0.4-th induced quantile posterior means Q s (0.4|X,p). Borrowing strength is then achieved 
by fitting a Gaussian process regression with these induced posterior quantiles (Figure [2b]) , 
where the bandwidth is chosen as the minimum value such that noncrossing holds everywhere 
(for all r and X). A general algorithm can be summarized as follows: 

1. Fit P separate ALD(p), p = p 1 , ...,pp, (Equation [2]); 

2. Calculate induced quantile posterior means Q s (r|X,p) for all X and r — pi,... ,p P 


3. Initialize b ~ 0 and while quantile estimates cross, increase b and calculate regression 


adjusted quantile estimates (10) for every X and r — pi,... ,pp. 


The performance of the regression adjusted quantile estimator will be studied on simu¬ 
lated and real data examples in the next sections. 
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(a) First stage: ALD auxiliary models 


(b) Second stage: Gaussian process regression 


Figure 2: Two-stage approach illustration: 0.4-th quantile estimation for a given value of X. 


3 Simulation study 

3.1 Linear quantile regression 


The proposed two-stage quantile model is compared to two noncrossing linear quantile re¬ 


gression methods, the constrained minimization approach of Bonded et al. (2010) and the 


semiparametric Bayesian model of Reich & Smith (2013). Codes for the first are available 


from the author’s web page, whereas the second is implemented in BSquare package (Smith 


& Reich, 2013) in R (|R Core Team, 2014). Our first stage estimation is carried out using 


the bayesQR package (Benoit et ah, 2014) from R. Default uninformative priors were used 


throughout. We considered four simulation designs studied by Reich & Smith (2013): 


Design 1. ft(r) = log[r/(l - r)], ft(r) = 2; 

Design 2. ft(r) = sign(0.5 — r) log (1 — 2 10.5 — r|), ft(r) = 2r; 
Design 3. ft(r) = $ -1 (t), ft(r) = 2 min {r — 0.5,0}; 
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Design 4. /3 0 (t) = 2<F V), ft(r) = 2min{r - 0.5,0}, ft(r) = 2r, ft(r) = 2, ft(r) = 1, 

ftO) = 0; 


For each design, we generate Ui ~ Unif(0,1), i = 1,..., 100, and generate the jth covari¬ 


ate Xij ~ Unif(—1,1). Y t — Po(Ui) + V ■ Xij/3j(Ui). Quantiles r = 0.05, 0.06,..., 0.95 were 


fitted to the data. For the Reich & Smith (2013) method, logistic base distribution was used 


with 4 basis functions. For our first stage estimation, we fitted standard Bayesian quantile 
regression with 31500 MCMC draws, thinning m = 30 and burn-in of 1500. 

To compare the methods, 500 data sets were simulated and the empirical root mean 


integrated squared error, RMISE = y 1/n Y^=i {Q( T \^i) ~ Q( r \Xi)} 2 , was computed for 
each data set and r. Crossing occurred in 405, 484, 480 and 500 out of 500 data sets in the 
four designs. Figure [3] presents the average RMISE for all designs and investigated methods. 
The first design is very peculiar because ft does not vary with r. For this setting, the 


method of Reich & Smith (2013) is significantly better than the others for the extreme quan¬ 


tiles, whereas at the interquartile range this difference is not so significant. Standard error 
plots of average RMISE were omitted, but values for all methods range from 0.6 (around the 
interquartile range) to 1.5 at the tails. This performance is expected as Design 1 satisfies 


the assumptions of the Reich & Smith (2013) model and, naturally, the correct model speci¬ 


fication adds valuable information, especially where data availability is scarce. On the other 
hand, for Designs 2 and 3, where quantile regression is more appealing given that parame¬ 


ters are varying with r, the regression adjustment and Bondcll et ah (2010) methods have 


similar performance, both outperforming Reich & Smith (2013) almost everywhere. For the 


multivariate scenario brought by Design 4, the proposed method has smallest RMISE than 
the others for most quantile levels and this difference is significant around the interquartile 
range. Therefore, except for the simple scenario of Design 1, the proposed Gaussian process 
regression adjustment of standard Bayesian quantile estimates performs similar to or better 
than the simultaneous fitting methods of the other two approaches. 
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(a) Design 1 (b) Design 2 




(c) Design 3 (d) Design 4 

Figure 3: Average RMISE (xlOO) over 500 data sets at r = 0.05, 0.06,..., 0.95. 


An initial drawback of the proposed approach is that the Gaussian process regression 
adjustment may jeopardize the linear relationship between the dependent and independent 


variables. Note that the final estimate is a weighted average of the induced quantiles (10), 
whose weights depend on the covariance matrix S' and, consequently, on the covariates X. 
This compromises the linearity, particularly if the bandwidth is large. Nevertheless, using 


14 























an approximation to a constant covariance matrix E' with respect to X proves to perform 

well. 

For a given r and X, the elements of the diagonal covariance matrix S' are the variances 
of the induced quantile posterior means Q s (r|X,p) for different auxiliary models ALD(p). 
Figure [4] shows how these elements vary with p for a sample from Design 3. The different 
grey curves are the covariances E' for each observed X. Notice that the grey curves share 
a similar shape across p for different values of X. Therefore, the covariance matrices S' are 
approximately proportional to the mean covariance across X (black curve), denoted by Eh 
Thus to maintain linearity, we may use E' to approximate the covariance matrix. 



(a) Quantile level r = 0.5 (b) Quantile level r = 0.95 

Figure 4: Variance of induced quantile posterior means Q s (t |X,p) for all values of X (grey 
lines) and the mean variance across X (black line) for a sample from Design 3. 

The average RMISE for the proposed linear Gaussian process regression adjustment 
(LGPR) coincides with the previous GPR results (Figure [3]) for all simulated designs. From 
Figure [5] we can see that the final estimates from LGPR are very similar to the ones obtained 
through GPR even when the bandwidth is very large. Therefore the approximation used for 
linear regression appears to be satisfactory, producing very similar results from the original 
method while holding the linearity needed for parameter interpretation purposes. 
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(a) Median bandwidth (6 = 0.1) (b) Largest bandwidth (6 = 2000) 

Figure 5: Quantile regression estimates using Gaussian process regression adjustment (GPR) 
and linear Gaussian process regression adjustment (LGPR) for two different samples from 
Design 3 at quantile levels r = 0.10,0.16,..., 0.94. 


3.2 Nonparametric quantile regression 


For nonparametric quantile regression, the frequentist approach of Bondcll et ah (2010) for 


fitting linear splines will be considered. However, as their codes are not available for the 
nonparametric case, we will analyse the same simulation designs proposed by them to have 
their results for comparison. Let y t = /(ay) + g(xj)ei be a heteroscedastic error model, then 
Designs 5 and 6 are given by the following choices of mean and covariance functions: 


Design 5. f{x) = 0.5 + 2x + sin(27rx — 0.5), g{x) = 1; 

Design 6. f(x) = 3x, g(x) = 0.5 + 2x + sin(27rx — 0.5); 

Similarly, for each design, i = 1,..., 100 samples were generated, given that ~ 
Unif(0,1) and e* ~ N(0,1). Quantile levels t = 0.05, 0.06,..., 0.95 were estimated. For 
the first stage regression adjustment, cubic splines with 25 equally spaced knots were fitted 
to the data using standard Bayesian quantile regression from bayesQR package (same MCMC 
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configuration as previous simulation designs). Again, we use uninformative priors. Linear 


B-splines with knots at each data point were used by Bondell et ah (2010). We simulated 
500 data sets and all samples presented crossing issues when fitted with standard Bayesian 
quantile regression. Results are presented in Figure [6} 
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(a) Design 5 


(b) Design 6 


Figure 6: Average RMISE (xlOO) over 500 data sets at r = 0.05, 0.06,..., 0.95. Results for 


Bondell’s method for r = 0.5, 0.7, 0.9 are from Bondell et al. (2010). 


For Design 5 and r = 0.5, 0.7, 0.9, the magnitude of the standard errors of average RMISE 
of both methods are below 0.5 (standard error plots omitted), suggesting that regression 
adjustment has significantly smaller RMISE than the constrained minimization approach. 
Whereas for the complex covariance function brought by Design 6, both methods have similar 
performance. Furthermore, as the regression curves are, in general, expected to be smooth, 
cubic splines interpolation is preferable than linear splines, see Figure [7J However, cubic 


splines is not supported by Bondell et al. (2010). 


Therefore, the two stage approach allows for more flexibility to handle any nonparamet- 
ric quantile function in the first stage, and results are better than, or at least similar to, 
competitive approaches. 
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(a) Design 5 


Figure 7: True and estimated conditional 
0.05,0.15,..., 0.95. 

4 Real data applications 

Quantile regression is widely used in medicine 
will be explored in this section through two r 



0.2 0.4 0.6 0.8 

X 


(b) Design 6 

quantile functions for quantile levels r = 


1 environmental sciences. Both applications 
data examples. 


4.1 Immunoglobulin-G data set 


Centile charts are adopted in medicine to establish reference ranges in order to identify 
unusual subjects. However, as interest lies in estimating many quantiles of the response 
distribution, the estimates often cross. In the search for reference ranges to help diag¬ 


nose immunodeficiency in infants, Isaacs et al. (1983) measured the serum concentration of 


immunoglobulin-G (IgG) in 298 preschool children. This famous data set will be used here 
to estimate IgG conditional quantile levels r = 0.005, 0.01,..., 0.995. A quadratic model in 


age is used to fit the data due to the expected smooth change of IgG with age. See [Isaacs 
et al.| ( JI983 ), Yu & Moyeed (2001) and Kottas & Krnjajic (2009). 

Figure [8] presents the results before and after the Gaussian process regression adjustment. 
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For ease of visualization, fewer quantile curves are drawn from r = 0.05 to r = 0.95 (step 
size equals 0.05), whereas at the tails, where most of the crossing occurs, all estimated 
quantiles are plotted. The highlighted curves have quantile levels identified on its right side 
for reference purposes. 




(a) Standard Bayesian quantile regression (b) Gaussian process regression adjustment 
Figure 8: Growth chart of serum concentration of immunoglobulin-G for young children. 


The independent fitting of standard Bayesian quantile regression (Figure 8a) displays 


many crossing curves. Furthermore, the comparison of quantile curve estimate for r = 
0.995 and its neighbour r = 0.99 demonstrates the consequences of not borrowing strength. 
Although both quantile levels are remarkably close, we get very different estimates when 
fitting them separately, which does not look very realistic and leads to the extreme case of 
crossing. The Gaussian process regression adjustment corrects the crossing by borrowing 
information from nearby quantiles (Figure [8b]). Therefore, the final estimates not only 
respect the monotonicity constraint, but are smoother than the initial ones. The estimated 
conditional quantile function for children with 6 years old, previously presented in Figure [lj 
evidences the monotonicity and smoothing effect of the two-stage approach. Hence, better 
quantile estimates are provided here, without compromising the simplicity or flexibility of 
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the standard Bayesian approach. 


4.2 Global mean sea level variation data set 


From satellite radar altimeters measurements, Nerem et al. (2010) constructed a climate 


data record of global mean sea level change (AMSL), which is defined as “the area-weighted 
mean of all of the sea surface height anomalies measured by the altimeter in a single, 10-day 
satellite track repeat cycle”. The data set consists of 762 observations dating from 1992 to 


2014 (for data processing details see Nerem et al. (2010)). 

Sea level rise is closely related to climate change and its study is paramount, specially 
for habitants of coastal and island regions, where extreme events are likely to become more 
frequent. Quantile regression can provide a greater picture of the global mean sea level 
distribution whilst modelling several quantile levels. For this study, we used the regression 
adjustment method to estimate 19 regression curves, r = 0.05, 0.10,..., 0.95. In the first 
stage (Figure [9a]), nonparametric Bayesian quantile regression was fitted using cubic splines. 
Only the first decade is presented for a better visualisation of the crossing issue. We used the 


model described by (Dortet-Bernadet & Fan, 2012), where regression splines of any degree 


can be fitted, with unknown knot number and location, based on the ALD distribution, 
codes were provided by the authors. Alternative methods for spline fitting can also be found 


m 


Chen & Yu (2009) and Thompson et al. (2010). 


From Figure 9a, we notice that, as a consequence of the nonparametric fit, crossing 


becomes much more likely to occur. After adding the monotonicity constraint, the regression 
adjustment provides improved estimates by getting rid of the crossing and adding some nice 
smoothness (Figure [9b| . This study corroborates that sea level is overall increasing since 
1992, although the rate of the change is not constant over the period (plots for the second 
decade were onnnited, but increasing pattern persists and AMSL reaches 70 millimeters 
in 2014). As seasonal variations of the sea level were subtracted from this data set, the 
oscillating inter-annual periods are generally associated with El Nino and La Nina events 
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(a) Standard Bayesian quantile regression (b) Gaussian process regression adjustment 

Figure 9: Quantile regression of global mean sea level variation for r = 0.05, 0.10,..., 0.95. 


(Nerem et al. 2010). Quantile modelling adds important information to the monitoring 


process of global sea levels and ultimately helps to mitigate the impact of extreme events. 


5 Concluding remarks 

This paper presents a two-stage approach to noncrossing Bayesian quantile regression. Fol¬ 


lowing Yu & Moyeed (2001), the asymmetric Laplace distribution is used in the first stage as 


an auxiliary likelihood to fit several quantiles separately. A better exploration of this set of 
auxiliary fittings is undertaken in the second stage, in which the initial estimates are adjusted 
by borrowing strength from nearby quantiles using Gaussian process regression. The proce¬ 
dure is computationally simple as initial quantiles are fitted independently and no MCMC 
is needed in the second stage. Monotonicity constraints are also handled easily through the 
estimation of a single parameter. In addition, final estimates reduces to the initial ones if 
noncrossing exists and posterior consistency is maintained in case it holds initially. 

Simulation studies demonstrate the flexibility of the two-stage approach in modelling 
complex quantile functions, providing in general smaller RMISE than alternative methods. 
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Moreover, it handles both linear and non-linear quantile regression curves. Although an 
approximation to the covariance matrix is needed in the linear case, its performance is very 
satisfactory. The possibility to choose any nonparametric curve to model the data again 
shows the great versatility of the proposed method. Indeed the algorithm can be modified 
to accommodate models other than the ALD in the first stage. Additional smoothness of 
the quantile estimates is another attractive property as illustrated in real data analyses. 

Being an adjustment of initial estimates, the performance of the proposed and standard 
approaches are naturally related. Flexibility and low RMISE, for instance, are great features 
directly incorporated. However, caution is needed in the case where the initial estimates 
perform too badly, which can occur when sample size is small and, consequently, extreme 
quantiles are fitted very poorly. The two-stage approach will still correct the crossing is¬ 
sues, but at a cost of an undesirable high bandwidth. In these circumstances, simultaneous 
quantile estimation should be preferred. 
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6 Appendix 

Proof of Proposition 2. Let q pt be a short notation for observed induced quantile sample 
<5^(r|x,p p ) and q p be the observed induced quantile posterior mean Q s (r|X, p p ). Then, the 
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likelihood function for the model with all MCMC samples (|5]) can be factorized as follows 


p(qpt|g(p P ), 
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Therefore Fisher-Neyman factorization theorem implies that induced quantile posterior means 
T(c\pt) = (qi ,..., qp ) are jointly sufficient for (g(pi ),..., g(pp))- Consequently, the likelihood 
for the model with all MCMC observations is proportional to the likelihood for the model 
with induced quantile posterior means, i.e. p(q P t|g(p p ), & 2 p ) oc p(q p |g(p p ), <t 2 ). Furthermore, 
as the priors for g(p) are the same for both models, then they have the same posterior and 
predictive posterior distributions for g(p). 

Therefore, considering that the predictive posterior distribution for g'{p—r) is M{p!, cr /2 ), 
so is the predictive distribution for g(p—r). Lastly, as Q*(r\x,p=r, q pt ) = g*(p=r)+e, the pre¬ 
dictive posterior distribution for Q(r\x,p) is Q*(t\x,p=t, q pt ) rv_/ o' 2 + <7 2 (t|X,p=t)). 

□ 
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